Abstract

Auctions play an important role in electronic commerce, and have been used to solve problems in 
distributed computing. Automated approaches to designing effective auction mechanisms are helpful in 
reducing the burden of traditional game theoretic, analytic approaches and in searching through the large 
space of possible auction mechanisms. This paper presents an approach to automated mechanism design 
(AMD) in the domain of double auctions. We describe a novel parameterized space of double auctions, 
and then introduce an evolutionary search method that searches this space of parameters. The approach 
evaluates auction mechanisms using the framework of the TAC Market Design Game and relates the performance
of the markets in that game to their constituent parts using reinforcement learning. Experiments
show that the strongest mechanisms we found using this approach not only win the Market Design Game 
against known, strong opponents, but also exhibit desirable economic properties when they run in isolation. 



1 Introduction 

In the Internet era, ecommerce has grown and flourished. The greater amount of available information, 
the lower cost of communication, and other reductions in economic friction make the world 'flatter' than
ever before, promoting automated marketplaces and the adoption of autonomous agents in ecommerce ifTJl 
H6l . In financial markets, traders have continuously turned to automated algorithmic trading services to 
deal with faster transactions and more complex market dynamics [31]. According to an article from The 
Economist [6], algorithmic trading accounted for a third of all share trades in the United States, and Aite
Group, a consultancy, reported that the figure will reach 50% by 2010. News and events usually affect
market predictions and lead to high price volatility, which in turn creates opportunities for arbitrage between 
markets. Unpredictable dynamics and complex linkages between markets make more robust, efficient market 
mechanisms very desirable. 

Online auction sites like eBay provide a way for consumers to buy a wide range of items. Since its 
establishment in 1995, eBay has expanded into dozens of countries and now makes billions of dollars each 
year. The auction mechanisms used by eBay and other successful auction sites, however, are not perfect. For
example, an eBay auction typically finishes at a fixed time, allowing a bidder to bid only moments before 
the auction terminates and steal a deal from bidders who would offer higher prices if given the chance lfl2l . 
This means a loss of revenue for both sellers and eBay. Another issue, and one to which many researchers 
have paid much attention, is that eBay runs many simultaneous sequential auctions ll9l [T5ll . In other words, 
on eBay, hundreds, even thousands, of on-going auctions may sell the same kind of goods. It is difficult for a 
potential buyer to select an auction that would result in the lowest winning bid. As a result, a successful bid 
in one auction may be lower than a failed bid in another, leading to complaints from both sellers and bidders, 
lower efficiency of the auctions and, in time, less revenue for eBay. 

Electronic auctions have also been used to sell things that are not goods in a traditional sense. For 
example, search engines like Google and Yahoo, in the role of publisher, typically use auctions to select 
and show relevant advertisements along with search results on their web sites. For each keyword-based 
search query, an ad auction is run to select bids from advertisers. Each selected advertiser provides an ad to 
display in one of a certain number of ad positions on the search result page. Better positions, which draw 
more attention from users, are allotted to advertisers that bid higher. An advertiser usually pays on a per-click
basis rather than on the per-impression basis in the traditional media. Publishers have commonly used
variants of an auction mechanism called the generalized second-price auction to determine winning bids
from advertisers and their ad positions. Although ad auctions generate many dollars in income each year for
these companies, how to analyze the current practices and design more effective ad auction mechanisms is
still a major concern. For instance, Lahaie and Pennock [18] compared the ranking rule used by Yahoo
(based on the prices of bids) and that used by Google (based on the expected profits of bids to Google),
and concluded that neither rule consistently outperforms the other.

All these scenarios from e-commerce challenge the designers of electronic auction mechanisms to design
more desirable mechanisms. This opens up new lines of research in computer science, such as inventing new
algorithms for deciding the winning bid in auctions [19], deciding how best to bid in multiple auctions [30],
and building the software infrastructure to run such auctions [23].

The Internet also significantly boosts the adoption of distributed computing, in particular agent-based 
computing. A major tool for multi-agent system designers has been game theory (GT). GT provides a framework
for studying strategic, interacting individuals and solution concepts (usually various equilibria)
with the assumption of the rationality of individuals. GT thus helps to compare the outcomes of an interaction
mechanism to the optimal ones in theory, but it does not give a dynamic model that explains how to reach
optimal outcomes, nor does it present much guidance on how to maximize global outcomes when some agents in
the system are not as rational as presumed.

Auction mechanisms are an ideal candidate to provide this missing model. The effectiveness of auction
mechanisms in the real world and the similarity between an auction and a multi-agent system (both involve
multiple self-interested individuals and concern certain global outcomes) have led to various
market-based approaches to multi-agent coordination and resource allocation problems in cluster and grid
computing environments [14, 32, 38]. These approaches have proved superior to pre-existing,
non-market solutions in terms of a combination of performance, scalability, and reliability.
However, the market mechanisms adopted by these approaches are usually selected arbitrarily or based on
certain heuristics. It is unknown whether these market mechanisms are optimal solutions, or whether there
are better options.

2 A new approach 

2.1 Automated mechanism design 

Facing the challenges in both electronic commerce and market-based control, we need to solve the following 
problem: Given a certain set of restrictions and desired outcomes, how can we design a good, if not optimal, 
auction mechanism; or when the restrictions and goals alter, how can the current mechanism be improved to 
handle the new scenario? 

The traditional answer to this question has been in the domain of auction theory ifTTll . A mechanism is 
designed by hand, analyzed theoretically, and then revised as necessary. The problems with the approach are 
exactly those that dog any manual process: it is slow, error-prone, and restricted to just a handful of individuals
with the necessary skills and knowledge. In addition, there are classes of commonly used mechanisms,
such as the double auctions that we discuss here, which are too complex to be analyzed theoretically, at least
for interesting cases [35].

Automated mechanism design (AMD) aims to overcome the problems of the manual process by designing 
auction mechanisms automatically. AMD considers design to be a search through some space of possible 
mechanisms. For example, Cliff [2] and Phelps et al. [27, 28] explored the use of evolutionary algorithms to
optimize different aspects of the continuous double auction. Around the same time, Conitzer and Sandholm
[4] were examining the complexity of building a mechanism that fitted a particular specification.

These different approaches were all problematic. The algorithms that Conitzer and Sandholm considered 
dealt with exhaustive search, and naturally the complexity was exponential. In contrast, the approaches that 
Cliff and Phelps et al. pursued were computationally more appealing, but gave no guarantee of success and 
were only searching tiny sections of the search space for the mechanisms they considered. As a result, one 
might consider the work of Cliff and Phelps et al., and indeed the work we describe here, to be what Conitzer
and Sandholm [5] call "incremental" mechanism design, where one starts with an existing mechanism and
incrementally alters parts of it, aiming to iterate towards an optimal mechanism. Similar work, though using
a different approach to searching the space of possible mechanisms, has been carried out by [34] and
has been applied to several different mechanism design problems [29].

The problem with taking the automated approach to mechanism design further is how to make it scale — 
though framing it as an incremental process is a good way to look at it, it does not provide much practical 
guidance about how to proceed. Our aim in this paper is to provide more in the way of practical guidance, 
showing how it is possible to build on a previous analysis of the most relevant components of a complex 
mechanism in order to set up an automated mechanism design problem, and then describing one approach to 
solving this problem. 

2.2 CAT games 

We set our work within the context of the Trading Agent Competition Market Design game, also known as 
the CAT game. This competition, which ran for the last three years, asks entrants to design a market for a 
set of automated traders which are based on standard algorithms for buying and selling in a double auction, 
including ZI-C [11], ZIP [3], RE [7], and GD [10]. The game is broken up into a sequence of days, and each
day every trader picks a market to trade in, using a market selection strategy that models the situation as an
n-armed bandit problem [33, Section 2]. Markets are allowed to charge traders in a variety of ways and are
scored based on the number of traders they attract (market share), the profits that they make from traders 
(profit share), and the number of successful transactions they broker relative to the total number of shouts 
placed in them (transaction success rate). Full details of the game can be found in 0]. 

We picked the CAT game as the basis of our work for four main reasons. First, the double auctions that 
are the focus of the design are a widely used mechanism. Second, the competition is run using an open-source
software package called JCAT, which is a good basis for implementing our ideas. Third, after three
years of competition, a number of specialists have been made available by their authors, giving us a library of 
mechanisms to test against. Fourth, there have been a number of publications that analyze different aspects 
of previous entrants, giving us a good basis from which to start searching for new mechanisms. 

2.3 Towards a grey-box approach 

Particularly helpful is the prior work of Niu et al. [21, 22]. [22] is an analysis of the 2007 CAT competition,
which identifies a number of different components of the double auction market, along with different
implementations of these components proposed by the game entrants. [21] complements this analysis with
a description of a large number of simulations of competitions between the specialists from the 2007 CAT
game, systematically identifying how different specialists perform both in multi-market games, and in games
between pairs of specialists. Together these two papers mirror the black-box and white-box analyses from
software engineering. [22] provides a white-box analysis, looking inside each market in order to identify
which components it contains, and relating the performance of each market to the operation of its components.
[21] provides a black-box analysis, which ignores the detail of the internal components of each market
but provides a much more extensive analysis of how the specialists perform.

These analyses make a good combination for examining the strengths and weaknesses of specialists. The 
white-box approach is capable of relating the internal design of a strategy to its performance and revealing 
which part of the design may cause vulnerabilities, but it requires access to the internal structure and involves
manual examination. The black-box approach does not rely upon the accessibility of the internal design of a strategy. It
can be applied to virtually any strategic game, and is capable of evaluating a design in many more situations.
However, the black-box approach tells us little about what may have caused a strategy to perform poorly
and provides little in the way of hints as to how to improve the strategy. It is desirable to combine these two
approaches in order to benefit from the advantages of both. Following the GA-based approach to trading strategy
acquisition and auction mechanism design in [2, 26, 28], we propose what we call a grey-box approach
to automated mechanism design that solves the problem of automatically creating a complex mechanism by
searching a structured space of auction components.

In other words, we concentrate on the components of the mechanisms (as in the white-box approach), but 
take a black-box view of the components, evaluating their effectiveness by looking at their performance
against that of their peers. 

More specifically, we view a market mechanism as a combination of auction rules, each as an atomic 
building block. We consider the problem: how can we find a combination of rules that is better than any 
known combination according to a certain criterion, based on a pool of existing building blocks? The black-box
analysis in [21] maintains a population of strategies and evolves them generation by generation based
on their fitness. Here we intend to follow a similar approach, maintaining a population of components
or building blocks for strategies, associating each block with a quality score, which reflects the fitness of
auction mechanisms using this block, exploring the part of the space of auction mechanisms that involves
building blocks of higher quality, and keeping the best mechanisms we find.

3 Grey-box AMD



Having sketched our approach at a high level, we now look in detail at how it can be applied in the context of 
the CAT game. 

3.1 A search space of double auctions 

The first issues we need to address are: what composite structure is used to represent auction mechanisms,
and where can we obtain a pool of building blocks?

Viewing an auction as a structured mechanism is not a new idea. Wurman et al. [37] introduced a
conceptual, parameterized view of auction mechanisms. Niu et al. [22] extended this framework for auction
mechanisms competing in CAT games and provided a classification of entries in the first CAT competition 
that was based on it. The extended framework includes multiple intertwined components, or policies, each 
regulating one aspect of a market. We adopt this framework, include more candidates for each type of policy 
and take into consideration parameters that are used by these policies. 

These policies are inferred from the literature [20], taken from our previous work [21, 22, 24], or
contributed by entrants to the CAT competitions. These policies, each as a building block, form a solid
foundation for the grey-box approach.

Figure 1 illustrates the building blocks as a tree structure, which we describe after we review the blocks
themselves. We describe the different types of policies in detail below and discuss how we search the space
based on the tree structure in the next section.

3.1.1 Matching policy 

Matching policies, denoted as M in Figure 1, define how a market matches shouts made by traders. Equilibrium
matching (ME) is the most commonly used matching policy [20, 36]. The offers made by traders
form the reported demand and supply, which are usually different from the underlying demand and supply.
The latter are determined by traders' private values and are unknown to the market, since traders are assumed to be
profit-seeking and make offers deviating from their private values. ME clears the market at the reported
equilibrium price and matches intra-marginal asks (offers to sell) with intra-marginal bids (offers to buy).
With an intersecting demand and supply, the shouts on the left of the intersection (the equilibrium point) and
their traders are called intra-marginal since they can be matched and make profit, while those on the right
are called extra-marginal. It is worth mentioning that a shout, or a trader, that appears to be intra-marginal
or extra-marginal in the reported demand and supply may not be so in the underlying demand and supply.
Max-volume matching (MV) aims to increase transaction volume based on the observation that a high
intra-marginal bid can match with a lower extra-marginal ask, though with a profit loss for the buyer. It does so
to realize the maximal transaction volume that is possible. A generic, parameterized, matching policy can be
defined to include ME and MV as two special cases. This policy, denoted as MT, uses a parameter, θ, which
can be any value in [−1, 1]. When θ is −1, MT does not match any shout; when θ is 0, MT becomes ME;
and when θ is 1, MT becomes MV. For any other value of θ, MT tries to realize a transaction volume that
lies proportionally between zero and those realized in ME and MV.
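
The relationship between ME, MV, and MT can be made concrete with a small sketch. The two volume functions follow the definitions above; the interpolation in `target_volume` is only an assumption (piecewise linear in θ, rounded down), since the text fixes just the three anchor points θ = −1, 0, and 1. All function names are hypothetical and are not taken from JCAT.

```python
def equilibrium_volume(bids, asks):
    """Number of pairs matched by equilibrium matching (ME)."""
    bids = sorted(bids, reverse=True)  # most competitive bids first
    asks = sorted(asks)                # most competitive asks first
    volume = 0
    for bid, ask in zip(bids, asks):
        if bid < ask:                  # past the reported equilibrium point
            break
        volume += 1
    return volume


def max_volume(bids, asks):
    """Largest number of bid-ask pairs with bid >= ask (MV)."""
    bids, asks = sorted(bids), sorted(asks)
    volume, j = 0, 0
    for ask in asks:
        # A bid too low for this ask is too low for every later ask as well.
        while j < len(bids) and bids[j] < ask:
            j += 1
        if j == len(bids):
            break
        volume += 1
        j += 1
    return volume


def target_volume(theta, bids, asks):
    """Assumed MT target: linear in theta between 0, ME volume, and MV volume."""
    if not -1.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [-1, 1]")
    v_me = equilibrium_volume(bids, asks)
    v_mv = max_volume(bids, asks)
    if theta <= 0:
        return int((1 + theta) * v_me)
    return v_me + int(theta * (v_mv - v_me))
```

With bids of 10, 8, and 6 and asks of 5, 7, and 9, ME matches two pairs, while MV can match all three by pairing each bid with the ask just below it.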

3.1.2 Quote policy 

Quote policies, denoted as Q in Figure 1, determine the quotes issued by markets. Typical quotes are ask and
bid quotes, which respectively specify the upper bound for asks and the lower bound for bids that may be
placed in a quote-driven market. Two-sided quoting (QT) defines the ask quote as the minimum of the lowest
tentatively matchable bid and lowest unmatchable ask, and defines the bid quote as the maximum of the
highest tentatively matchable ask and highest unmatchable bid. One-sided quoting (QO) is similar to QT, but 
considers only the standing shouts closest to the reported equilibrium price from the unmatchable side. When 
the market is cleared continuously (see below), QO is identical to QT, but otherwise forms a possibly looser 
restriction on placing shouts. Spread-based quoting (QS) extends QT to maintain a higher ask quote and a 
lower bid quote for use with MV. With QS, when the ask quote is lower than the bid quote, the former is set 
somewhere above their average and the latter below the average, and the spread between the two is a fixed 
value. QS helps relax the constraint put on shouts with too low an ask quote and too high a bid quote. 
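
As an illustration of the QT definition, the following sketch computes both quotes from four lists of standing shout prices. It assumes the market has already split the shouts into tentatively matched and unmatched ones; the function name and the use of infinite quotes for an empty book are assumptions made for this example.

```python
import math


def two_sided_quotes(matched_bids, matched_asks, unmatched_bids, unmatched_asks):
    """Return (ask_quote, bid_quote) following the QT description."""
    # Ask quote: minimum of the lowest matchable bid and the lowest unmatchable ask.
    ask_quote = min(min(matched_bids, default=math.inf),
                    min(unmatched_asks, default=math.inf))
    # Bid quote: maximum of the highest matchable ask and the highest unmatchable bid.
    bid_quote = max(max(matched_asks, default=-math.inf),
                    max(unmatched_bids, default=-math.inf))
    return ask_quote, bid_quote
```

For matched bids 10 and 8, matched asks 5 and 7, an unmatched bid of 6, and an unmatched ask of 9, the ask quote is 8 and the bid quote is 7, so a new ask must undercut 8 and a new bid must exceed 7 to beat the quotes.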

3.1.3 Shout accepting policy 

Shout accepting policies, denoted as A in Figure 1, judge whether a shout made by a trader should be permitted
in the market. Always accepting (AA) accepts any shout, and never accepting (AN) does the opposite.
Quote-beating accepting (AQ) allows only those shouts that are more competitive than the corresponding market
quote. This has been commonly used in both experimental settings and real stock markets, and is sometimes
called the "New York Stock Exchange rule" since that market adopts it. Self-beating accepting (AS) accepts
all first-time shouts but only allows a trader to modify its standing shout with a more competitive price.
Equilibrium-beating accepting (AE) learns an estimate of the equilibrium price based on the past transaction
prices in a sliding window, and requires bids to be higher than the estimate and asks to be lower. AE uses a
parameter, w, to specify the size of the sliding window in terms of the number of transactions, and a second
parameter, δ, which can be added to the estimate to relax the restriction on shouts. This policy was suggested
in [24] and found to be effective in reducing transaction price fluctuation and increasing allocative efficiency
in markets populated by ZI-C traders. A variant of AE, denoted as AD and introduced by the PSUCAT team
in the first CAT competition, uses the standard deviation of transaction prices in the sliding window rather
than a constant δ to relax the restriction on shouts. History-based accepting (AH) is derived from the GD
trading strategy [10] and reported to be a crucial component of one particular strong market mechanism for
CAT games [21]. GD computes how likely a given offer is to be matched, based on the history of previous
shouts, and AH uses this to accept only shouts that will be matched with probability no lower than a specified
threshold, z ∈ [0, 1]. Transaction-based accepting (AT) tracks the most recently matched asks and bids, and
uses the lowest matched bid and the highest matched ask to restrict the shouts to be accepted. In a clearing
house (CH) [8], the two bounds are expected to be close to the estimate of equilibrium price in AE, while in a
continuous double auction (CDA), AT may produce a much looser restriction since extra-marginal shouts may
steal a deal. Shout type-based accepting (AY) allows shouts based merely on their types, i.e. asks or bids.
This mimics the continuum of auctions presented in Q, including retailer markets where only sellers shout, 
procurement auctions where only buyers shout, as well as general double auctions. 
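
Of the accepting policies above, AE lends itself to a compact sketch. The version below assumes that the equilibrium estimate is the plain mean of the last w transaction prices, that δ relaxes the bound symmetrically for bids and asks, and that every shout is accepted before any transaction has been seen. None of these details is specified in the text, and the class and method names are hypothetical.

```python
from collections import deque


class EquilibriumBeatingAccepting:
    """Sketch of AE with a sliding window of size w and relaxation delta."""

    def __init__(self, window, delta=0.0):
        self.prices = deque(maxlen=window)  # keeps only the latest `window` prices
        self.delta = delta

    def record_transaction(self, price):
        self.prices.append(price)

    def estimate(self):
        """Assumed estimator: mean transaction price in the window."""
        if not self.prices:
            return None
        return sum(self.prices) / len(self.prices)

    def accepts(self, price, is_bid):
        est = self.estimate()
        if est is None:
            return True  # assumption: no history yet, so nothing to beat
        if is_bid:
            return price >= est - self.delta  # bids must not fall too far below
        return price <= est + self.delta      # asks must not rise too far above
```

Replacing the constant `delta` with the standard deviation of the prices in the window would give a variant in the spirit of AD.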

3.1.4 Clearing condition 

Clearing conditions, denoted as C in Figure 1, define when to clear the market and execute transactions
between matched asks and bids. Continuous clearing (CC) attempts to clear the market whenever a new
shout is placed. Round clearing (CR) clears the market after all traders have submitted their shouts. This was
the original clearing policy in NYSE, but was replaced by CC later for faster transactions and higher volumes.
With CC, an extra-marginal trader may have more chance to steal a deal and get matched. Probabilistic
clearing (CP) clears the market with a predefined probability, p, whenever a shout is placed. It thus defines a
continuum of clearing rules with CR (p = 0) and CC (p = 1) being the two ends.

¹ The name "two-sided quoting" (Section 3.1.2) follows [20], since either quote depends on information on both the ask side and the bid side.

3.1.5 Pricing policy 

Pricing policies, denoted as P in Figure 1, set transaction prices for matched ask-bid pairs. The decision
making may involve only the prices of the matched ask and bid, or more information including market
quotes. Discriminatory k-pricing (PD) sets the transaction price of a matched ask-bid pair at some point in
the interval between their prices. The parameter k ∈ [0, 1] controls which point is used and usually takes
value 0.5 to avoid a bias in favor of buyers or sellers. Uniform k-pricing (PU) is similar to PD, but sets the
transaction prices for all matched ask-bid pairs at the same point between the ask quote and the bid quote. A
transaction price set by PU may or may not fall into the range between the matched ask and bid, depending
upon the matching policy and the quote policy in the auction mechanism. When it falls outside, whichever
of the ask and the bid is closer to the computed transaction price will be used as the final transaction price.
n-pricing (PN) was introduced in [24], and sets the transaction price as the average of the latest n pairs of
matched asks and bids. If the average falls out of the price interval between the ask and bid to be matched, 
the nearest end of the interval is used. This policy can help reduce transaction price fluctuation and has 
little impact on allocative efficiency. Side-biased pricing (PB) is basically PD with an internal k dynamically 
adjusted so as to split the profit in favor of the side on which fewer shouts exist. Thus the more that asks 
outnumber bids in the current market, the closer k is set to 0. 
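
The two k-pricing rules can be sketched as follows. The orientation of k (k = 0 yields the ask price, which favors the buyer) is inferred from the description of PB above and should be read as an assumption, as should the function names and the choice to interpolate PU from the ask quote towards the bid quote.

```python
def clamp(price, ask, bid):
    """Move a price to the nearest end of [ask, bid] if it falls outside."""
    return max(ask, min(bid, price))


def discriminatory_price(ask, bid, k=0.5):
    """PD: a point between the matched ask and bid, chosen by k in [0, 1]."""
    return ask + k * (bid - ask)


def uniform_price(ask_quote, bid_quote, ask, bid, k=0.5):
    """PU: one point between the market quotes, clamped to the matched pair."""
    raw = ask_quote + k * (bid_quote - ask_quote)
    return clamp(raw, ask, bid)
```

For example, with quotes of 12 and 14 and k = 0.5, PU computes a price of 13. A pair with ask 15 and bid 20 cannot trade at 13, so the ask price 15, the nearer end of the interval, is used instead.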

3.1.6 Charging policy 

Charging policies, denoted as G in Figure 1, determine the charges imposed by a market. This is typically
not an issue in research on auctions in isolation, but would affect the selection of markets by traders directly 
in an environment of multiple competing markets, as in CAT games. Fixed charging (GF) sets charges at a 
specified fixed level. Bait-and-switch charging (GB) makes a market cut its charges until it captures a certain 
market share, and then slowly increases charges to increase profit. It will adjust its charges downward again if 
its market share drops below a certain level. Charge-cutting charging (GC) sets the charges by scaling down 
the lowest charges imposed on the previous day, based on the observation that traders prefer markets with 
lower charges. Learn-or-lure-fast charging (GL) adapts charges towards some target following the scheme
used by the ZIP trading strategy [3]. If the market using this policy believes that the traders are still exploring
among markets and have yet to find a good one to trade in, the market adapts its charges towards zero to lure
traders to join and stay; otherwise it learns from the charges of the most profitable market. GL uses an
exploring monitor component to determine whether traders are exploring or not. A simple exploring monitor,
for example, examines the daily distribution of market shares of specialists. If the distribution is flat, the
traders are considered to be exploring; otherwise they are not. This is based on the observation that traders all tend
to go to the best market and cause an imbalanced distribution. To implement this, the degree of flatness
of the distribution (the standard deviation of the distribution relative to its mean) is
compared to a threshold in [0, 1]. Similar to ZIP, GL uses a learning rate parameter, r, to control how
fast the market adapts its charges.
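
The simple exploring monitor described above amounts to a coefficient-of-variation test. The sketch below assumes the population standard deviation, a strict comparison against the threshold, and a default answer of "exploring" when no trader has registered anywhere; the function name is hypothetical.

```python
from statistics import mean, pstdev


def traders_are_exploring(market_shares, threshold):
    """Return True if the daily distribution of market shares is flat."""
    average = mean(market_shares)
    if average == 0:
        return True  # assumption: with no traders at all, treat as exploring
    flatness = pstdev(market_shares) / average  # standard deviation relative to mean
    return flatness < threshold
```

Four markets with equal shares give a flatness of 0, so traders are judged to be exploring, whereas shares of 0.7, 0.1, 0.1, and 0.1 give a flatness slightly above 1 and indicate that traders have largely settled on one market.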

All these charging policies require an initial set of fees on different activities, including fee on registration,
fee on information, fee on shout, fee on transaction, and fee on profit, denoted as f_r, f_i, f_s, f_t, and f_p
respectively in Figure 1.

3.1.7 A tree model



The tree model of double auctions in Figure 1 illustrates how building blocks are selected and assembled
level by level. There are and nodes, or nodes, and leaf nodes in the tree. An and node, rounded and filled, 
combines a set of building blocks, each represented by one of its child nodes, to form a compound building 
block. The root node, for example, is an and node to assemble policies, one on each aspect described above, 
to obtain a complete auction mechanism. An or node, rectangular and filled, represents the decision making 
of selecting a building block from the candidates represented by the child nodes of the or node based on 
their quality scores. This selection occurs not only for those major aspects of an auction mechanism, i.e. 
M, Q, A, P, C, and G (at G's child node of 'policy' in fact), but also for minor components, for example, a
learning component for an adaptive policy (as Phelps et al. do regarding a trading strategy [26]), and
for determining optimal values of parameters in a policy, like θ in MT and k in PD. A leaf node represents an
atomic block that can either be for selection at its or parent node or be further assembled into a bigger block
by its and parent node. A special type of leaf node in Figure 1 is that with a label in the format of [x, y]. Such
a leaf node is a convenient representation of a set of leaf nodes that have a common parent — the parent of 
this special leaf node — and take values evenly distributed between x and y for the parameter labeled at the 
parent node. 

or nodes contribute to the variety of auction mechanisms in the search space and are where exploitation 
and exploration occur. We model each or node as an n-armed bandit learner that chooses among candidate 
blocks, and use the simple softmax method [33, Section 2.3] to solve this learning problem. The same solution
is adopted in designing the market selection strategy for trading agents in CAT games [25].²
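
A minimal sketch of softmax selection at a single or node is given below. It uses the standard Boltzmann form, in which a candidate with quality score q is chosen with probability proportional to exp(q / τ) for a temperature τ. The function names are hypothetical, and subtracting the largest score before exponentiating is only a numerical safeguard that leaves the probabilities unchanged.

```python
import math
import random


def softmax_probabilities(scores, temperature):
    """Selection probabilities proportional to exp(score / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(scores)
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def choose_block(blocks, scores, temperature, rng=random):
    """Sample one candidate block at an or node."""
    probabilities = softmax_probabilities(scores, temperature)
    return rng.choices(blocks, weights=probabilities, k=1)[0]
```

When all quality scores are equal, every candidate is equally likely. Lowering the temperature shifts probability towards the highest-scoring candidates, which moves the learner from exploration to exploitation.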

3.2 The Grey-Box-AMD algorithm 

Given a set of building blocks, B, and a set of fixed markets, FM, as targets to beat, we define the skeleton of 
the grey-box algorithm below: 

Grey-Box-AMD(B, FM)

 1  HOF ← {}
 2  for s ← 1 to NUM_OF_STEPS
 3      do G ← Create-Game()
 4         SM ← {}
 5         for m ← 1 to NUM_OF_SAMPLES
 6             do M ← Create-Market()
 7                for t ← 1 to NUM_OF_POLICYTYPES
 8                    do B ← Select(B_t, 1)
 9                       Add-Block(M, B)
10                SM ← SM ∪ {M}
11         EM ← Select(HOF, NUM_OF_HOF_SAMPLES)
12         Run-Game(G, FM ∪ EM ∪ SM)
13         for each M in EM ∪ SM
14             do Update-Market-Score(M, Score(G, M))
15                if M not in HOF
16                   then HOF ← HOF ∪ {M}
17                if CAPACITY_OF_HOF < |HOF|
18                   then HOF ← HOF − {Worst-Market(HOF)}
19                for each B used by M
20                    do Update-Block-Score(B, Score(G, M))
21  return HOF

² However, the two scenarios may need different parameter values. The market selection scenario should favor choices that give a good
profit (a cumulative measure), while here we require effective exploration to find a good mechanism in the foreseeable future (a
one-time concern).



The Grey-Box-AMD algorithm runs a certain number of steps (NUM_OF_STEPS in Line 2). At each step,
a single CAT game is created (Create-Game() in Line 3) and a set of markets is prepared for the game.
This set of markets includes all markets in FM, a certain number (NUM_OF_SAMPLES in Line 5) of markets
sampled from the search space, denoted as SM, and a certain number (NUM_OF_HOF_SAMPLES in Line 11) of
markets, denoted as EM, chosen from a Hall of Fame, HOF. All these markets are put into the game, which
is run to evaluate the performance of these markets (Run-Game(G, FM ∪ EM ∪ SM) in Line 12). HOF has
a fixed capacity, CAPACITY_OF_HOF, and maintains markets that performed well in games at previous steps
in terms of their average scores across the games they participated in. HOF is empty initially, updated after each
game, and returned in the end as the result of the grey-box process.

Each market in SM is constructed based on the tree model in Figure 1. After an 'empty' market mechanism,
M, is created (Create-Market() in Line 6), building blocks can be incorporated into it (Add-Block(M, B)
in Line 9, where B is a building block from the given set). NUM_OF_POLICYTYPES in Line 7 defines the number of different
policy types, and from each group of policies of the same type, denoted as B_t where t specifies the type, a building
block is chosen for M (Select(B_t, 1) in Line 8). For simplicity, this algorithm illustrates only what
happens to the or nodes at the high level, including M, Q, A, C, and P. Markets in EM are chosen from HOF
in a similar way (Select(HOF, NUM_OF_HOF_SAMPLES) in Line 11).

After a CAT game, G, completes at each step, the game score of each participating market M ∈ SM ∪ EM,
Score(G, M), is recorded and the game-independent score of M, Score(M), is updated
(Update-Market-Score(M, Score(G, M)) in Line 14). If M is not currently in HOF and Score(M) is higher than
the lowest score of markets in HOF, it replaces that corresponding market (Worst-Market(HOF) in Line
18).

Score(G, M) is also used to update the quality score of each building block used by M
(Update-Block-Score(B, Score(G, M)) in Line 20). Update-Market-Score and Update-Block-Score
calculate, respectively, the game-independent scores of markets and the quality scores of building blocks
by averaging the feedback Score(G, M) over time. Because building blocks are chosen only at or nodes in
the tree, only child nodes of an or node have quality scores and receive feedback after a CAT game. Initially,
quality scores of building blocks are all 0, so that every block is equally likely to be chosen. As the exploration
proceeds, fitter blocks score higher and are chosen more often to construct better mechanisms.
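
The score bookkeeping described above can be sketched in a few lines of Python. This is a minimal
illustration under stated assumptions, not the JCAT implementation: the names RunningMean, build_market,
and feed_back are hypothetical, a market is reduced to a tuple of block labels (one per policy type M, Q, A,
C, and P), and the rule for choosing a block is passed in as a function so that any exploration method can
be plugged in.

```python
from collections import defaultdict


class RunningMean:
    """Score averaged over all feedback received so far (starts at 0)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Incremental form of the arithmetic mean; no history is stored.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


def build_market(blocks_by_type, pick):
    """Choose one building block per policy type, i.e. one per 'or' node.

    blocks_by_type maps a policy type (hypothetical keys such as "M", "Q")
    to the list of candidate blocks; pick selects one block from a list.
    """
    return tuple(pick(blocks) for blocks in blocks_by_type.values())


def feed_back(market, game_score, block_scores, market_scores):
    """Propagate Score(G, M) to the market and to every block it uses."""
    market_scores[market].update(game_score)
    for block in market:
        block_scores[block].update(game_score)


# Both tables start empty, so every block initially has quality score 0.
block_scores = defaultdict(RunningMean)
market_scores = defaultdict(RunningMean)
```

Because every block starts with the same score, any selection rule that depends only on scores treats all
blocks alike at the first step, which matches the even initial probabilities described in the text.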



4 Experiments 

This section describes the experiments that are carried out to acquire auction mechanisms using the grey-box 
approach. 



4.1 Experimental setup 

We extended JCAT with the parameterized framework of double auctions and all the individual policies
described in Section 3.1. To reduce the computational cost, we eliminated the exploration of charging policies
by focusing on mechanisms that impose a fixed charge of 10% on trader profit, which we denote as GF_{0.1}.
Analysis of CAT games [21] and of what entries have typically charged in actual CAT competitions, especially
in the latest two events, suggests that such a charging policy is a reasonable choice to avoid losing
either intra-marginal or extra-marginal traders. Even with this restriction, the search space still contains more
than 1,200,000 different kinds of auction mechanisms, due to the variety of policies on aspects other than
charging and the choices of values for parameters.

The experiments that we ran to search the space each last 200 steps. At each step, we sample two auction
mechanisms from the space, and run a CAT game to evaluate them against four fixed, well-known mechanisms
plus two mechanisms that performed well at previous steps and are members of the Hall of Fame. The scores
of the sampled and Hall of Fame mechanisms are fed back to every building block that these mechanisms
use and that is associated with a quality score.

To sample auction mechanisms, the softmax exploration method used by or nodes starts with a relatively
high temperature so as to explore randomly, then gradually cools down, and eventually maintains a
temperature that guarantees a non-negligible probability of choosing even the worst action at any time. After
all, our goal in the grey-box approach is not to converge quickly to a small set of mechanisms, but to explore
the space as broadly as possible and avoid being trapped in local optima.
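
A minimal sketch of softmax exploration with a cooling temperature follows. The text does not give the
cooling schedule, so the starting temperature, the decay rate, and the floor used here are assumptions
chosen only for illustration, and the function names are hypothetical.

```python
import math
import random


def softmax_probs(scores, temperature):
    """Boltzmann (softmax) distribution over actions given their scores."""
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def temperature_at(step, start=1.0, floor=0.1, decay=0.97):
    """Geometric cooling that never drops below a floor (assumed values)."""
    return max(floor, start * decay ** step)


def choose(scores, temperature, rng=random):
    """Sample the index of one action according to the softmax distribution."""
    probs = softmax_probs(scores, temperature)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]
```

With equal scores the distribution is uniform at any temperature. Keeping the temperature at or above a
positive floor means the worst-scoring block always retains a strictly positive probability of being chosen,
which is the behaviour the text asks for.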

The fixed set of four markets in every CAT game includes two CH markets, CH_l and CH_h, and two
CDA markets, CDA_l and CDA_h, with one of each charging 10% on trader profit, as GF_{0.1} does, and the
other charging 100% on trader profit (denoted as GF_{1.0}). CH and CDA mechanisms are two common double
auctions and have been used in the real world for many years, in financial marketplaces in particular, due to
their high allocative efficiency. Earlier experiments we ran, involving CH and CDA markets against entries into
CAT competitions, indicate that it is not trivial to win over these two standard double auctions. Markets with
different charge levels are included so that sampled mechanisms cannot take advantage of a single, fixed
charge level. Based on the parameterized framework in Section 3.1, the CH and CDA markets can be
represented as follows:

CH_l / CH_h = ME + QT + AQ + CR + PU_{k=0.5} + GF_{0.1} / GF_{1.0}
CDA_l / CDA_h = ME + QT + AQ + CC + PD_{k=0.5} + GF_{0.1} / GF_{1.0}

The Hall of Fame that we maintain during the search contains ten 'active' members and a list of 'inactive' 
members. After each CAT game, the two sampled mechanisms are compared with those active Hall of Famers. 
If the score of a sampled mechanism is higher than the lowest average score of the active Hall of Famers, the 
sampled mechanism is inducted into the Hall of Fame and replaces the corresponding Hall of Famer, which 
becomes inactive and ineligible for CAT games at later steps. An inactive Hall of Famer may be reactivated if 
an identical mechanism happens to be sampled from the space again and scores high enough to promote its 
average score to surpass the lowest score of active Hall of Famers. 
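
The induction and reactivation rules above can be sketched as follows. This is an illustrative reading of
the text rather than the actual implementation: the class name and its methods are hypothetical, mechanisms
are identified by any hashable key, and average scores are assumed to be kept for every mechanism that has
played, so that a mechanism sampled again can be compared on its updated average.

```python
class HallOfFame:
    """Fixed-size set of 'active' members plus a list of 'inactive' ones."""

    def __init__(self, capacity=10):
        self.capacity = capacity
        self.totals = {}       # mechanism -> [sum of game scores, games played]
        self.active = set()    # members eligible for later CAT games
        self.inactive = set()  # former members that were replaced

    def average(self, mechanism):
        total, games = self.totals[mechanism]
        return total / games

    def record(self, mechanism, game_score):
        """Update the average score, then induct or reactivate if warranted."""
        entry = self.totals.setdefault(mechanism, [0.0, 0])
        entry[0] += game_score
        entry[1] += 1
        if mechanism in self.active:
            return
        if len(self.active) < self.capacity:
            self.inactive.discard(mechanism)
            self.active.add(mechanism)
            return
        worst = min(self.active, key=self.average)
        if self.average(mechanism) > self.average(worst):
            # The newcomer takes the slot; the replaced member becomes inactive.
            self.active.remove(worst)
            self.inactive.add(worst)
            self.inactive.discard(mechanism)
            self.active.add(mechanism)
```

For example, with a capacity of two, a replaced member returns to the active set only when it is sampled
again and its new average exceeds the lowest average among the active members.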

Each CAT game is populated by 120 trading agents, using ZI-C, ZIP, RE, and GD strategies, with a quarter of
the traders using each strategy. Half the traders are buyers and half are sellers. The supply and demand
schedules are both drawn from a uniform distribution between 50 and 150. Each CAT game lasts 500 days with
ten rounds for each day. This setup is similar to that of actual CAT competitions except for a smaller trader
population that helps to reduce computational costs. A 200-step grey-box experiment takes around sixteen
hours on a Windows PC that runs at 2.8GHz and has 3GB of memory.
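
As a rough illustration of this setup, the trader population could be generated as below. The function
name, the dictionary layout, and the simplification that each trader holds a single private value are
assumptions made for this sketch; JCAT's own configuration mechanism is not shown here.

```python
import random

STRATEGIES = ("ZI-C", "ZIP", "RE", "GD")


def make_population(n=120, low=50.0, high=150.0, seed=None):
    """Build n traders: half buyers, half sellers, a quarter per strategy.

    Private values are drawn uniformly from [low, high]. n should be a
    multiple of 8 so that both splits are exact.
    """
    rng = random.Random(seed)
    traders = []
    for i in range(n):
        traders.append({
            "role": "buyer" if i % 2 == 0 else "seller",
            # Consecutive buyer/seller pairs cycle through the strategies.
            "strategy": STRATEGIES[(i // 2) % len(STRATEGIES)],
            "value": rng.uniform(low, high),
        })
    return traders
```

Assigning strategies to buyer and seller pairs keeps each strategy evenly represented on both sides of the
market.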

4.2 Experimental results 

We carried out four experiments to check whether the grey-box approach is successful in searching for good 
auction mechanisms. 

First, we measured the performance of the mechanisms that are being generated indirectly, through their
effect on other mechanisms. Since the four standard markets participate in all the CAT games, their
performance over time reflects the strength of their opponents (they will do worse as their opponents get
better), which in turn reflects whether the search generates increasingly better mechanisms. Figure 2a shows
that the scores of the four markets (more specifically the average daily scores of the markets in a game)
decrease over 200 games, especially over the first 100 games, suggesting that the mechanisms we are creating
get better as the learning process progresses.

Figure 2: Scores of market mechanisms over 200 steps during the grey-box process. (a) The four fixed auction
mechanisms. (b) The top ten active Hall of Famers.

Second, we measured the performance of the set of mechanisms we are creating more directly. The
mechanisms that are active in the Hall of Fame at a given point represent the best mechanisms that we know
about at that point and their performance tells us more directly how the best mechanisms evolve over time.
Figure 2b shows the scores of the ten active Hall of Famers at each step over a 200-step run.^3 As in Figure
2a, the first 100 steps see a clear, increasing trend. Note that even the scores of the worst of the ten at the end
are above 0.35, higher than the highest of the four fixed markets from Figure 2a. Thus we know that our
approach will create mechanisms that outperform standard mechanisms, though we should not read too much
into this since we trained our new mechanisms directly against them.

Third, a better test of the new mechanisms is to run them against those mechanisms that we know to
be strong in the context of CAT games, asking what would have happened if our Hall of Fame members
had been entered into prior CAT competitions and had run against the carefully hand-coded entries in those
competitions. We chose three Hall of Famers, which are internally labeled as SM7.1, SM88.0, and SM127.1
and can be represented in the parameterized framework in Section 3.1 as follows:



SM7.1 = MV + QO + AH_{τ=0.4} + CP_{p=0.3} + PN_{n=11} + GF_{0.1}
SM88.0 = MT_{θ=0.4} + QT + AA + CP_{p=0.4} + PU_{k=0.7} + GF_{0.1}
SM127.1 = MV + QS + AS + CP_{p=0.4} + PU_{k=0.7} + GF_{0.1}

We ran these three mechanisms against the best recreation of past CAT competitions that we could achieve
given the contents of the TAC agent repository,^4 where competitors are asked to upload their entries after
the competition. There were enough entries in the repository at the time we ran the experiments to create
reasonable facsimiles of the 2007 and 2008 competitions, but there were not enough entries from the 2009
competition for us to recreate that year's competition. The CAT games were set up in a similar way to the
competitions, populated by 500 traders that are evenly split between buyers and sellers and between the four
trading strategies (ZI-C, ZIP, RE, and GD), and the private values of sellers or buyers were drawn from
a uniform distribution between 50 and 150. For each recreated competition, we ran three games, as in the
actual competitions.

^3 ... mechanisms we know of.
^4 http://www.sics.se/tac/showagents.php

Table 1: The scores of markets in CAT games including the best mechanisms from the grey-box approach and
entries in prior CAT competitions, averaged over three CAT games respectively.

(a) Against CAT 2007 entries.           (b) Against CAT 2008 entries.

Market             Score        SD      Market             Score        SD
SM7.1           199.4500    5.9715      SM7.1           196.7240    9.2843
SM88.0          191.1083   10.3186      SM88.0          186.9247    4.2184
SM127.1         180.1277    9.0289      SM127.1         183.5887    9.7835
MANX            154.6953    1.3252      jackaroo        177.5913    2.5722
CrocodileAgent  142.0523    9.0867      Mertacor        161.5440    5.8741
TacTex          138.4527    5.8224      MANX            147.3050   15.7718
PSUCAT          133.1347    5.6565      IAMwildCAT      142.9167    8.9581
PersianCat      124.3767   11.2409      PersianCat      139.1553   17.9783
jackaroo        108.8017    8.6851      DOG             130.2197   18.9782
IAMwildCAT*     106.8897    4.4006      MyFuzzy         125.9630    1.9221
Mertacor         89.1707    4.9269      CrocodileAgent*  71.4820    5.8687
                                        PSUCAT*          68.3143    6.7389

* IAMwildCAT from CAT 2007, and CrocodileAgent and PSUCAT from CAT 2008, worked abnormally during
the games and tried to impose invalid fees, probably due to competition from the three new, strong
opponents. Although we modified JCAT to avoid kicking out these markets on those trading days when they
impose invalid fees (which JCAT does in an actual CAT tournament), these markets still perform poorly,
in contrast to their rankings in the tournaments.

Table 1 lists the average cumulative scores of all the markets across their three games along with the
standard deviations of those scores. The three new mechanisms we obtained from the grey-box approach
beat the actual entries in the competition by a comfortable margin in both cases. The fact that we can take
mechanisms that we generate in one series of games (against the fixed opponents and other new entries) and
have them perform well against a separate set of mechanisms suggests that the grey-box approach learns
robust mechanisms.

In passing, we note that the rankings of the entries from the repository do not reflect those in the actual 
CAT competitions. This is to be expected since the entries now face much stronger opponents and different 
markets will, in general, respond differently to this. Excluding the markets that attempt to impose invalid fees 
and are marked with '*', we can see that the overall performance of entries into the 2008 CAT competition 
is better than that of those into the 2007 CAT competition when they face the three new, strong, opponents, 
reflecting the improvement in the entries over time.

Table 2: Economic properties of the best mechanisms from the grey-box experiments and the auction
mechanisms explored in [24]. All NCDAEE mechanisms are configured to have w = 4 in their AE policies and
n = 4 in their PN policies. The best result in each column is shaded. Data in the first four rows are averaged
over 1,000 runs and those in the last four are averaged over 100 runs.

                              ZI-C                             GD
                      E_a             α               E_a             α
Market              Mean      SD    Mean      SD    Mean      SD    Mean      SD
CDA               97.464   3.510  13.376   4.351  99.740   1.553   4.360   3.589
NCDAEE_{δ=0}      98.336   3.262   4.219   3.141   9.756  28.873  14.098   1.800
NCDAEE_{δ=10}     98.912   2.605   5.552   2.770  23.344  41.727   7.834   5.648
NCDAEE_{δ=20}     98.304   2.562   7.460   3.136  89.128  30.867   4.826   3.487
NCDAEE_{δ=30}     97.708   3.136   8.660   3.740  99.736   1.723   4.498   3.502
SM7.1             99.280   1.537   4.325   2.509  58.480  47.983   4.655   4.383
SM88.0            98.320   2.477  11.007   4.251  99.920   0.560   4.387   2.913
SM127.1           97.960   3.225  11.152   4.584  99.520   1.727   4.751   3.153



Finally, we tested the performance of SM7.1, SM88.0, and SM127.1 when they are run in isolation, applying
the same kind of test that auction mechanisms are traditionally subject to. We tested the mechanisms both for
allocative efficiency, E_a, and, following [24], for the extent to which they trade close to theoretical
equilibrium as measured by the coefficient of convergence, α. Niu et al. [24] explored a class of double
auctions, called NCDAEE, which can be represented as:

NCDAEE = ME + AE_{w,δ} + CC + PN_n

The advantage of NCDAEE is that it can give significantly lower α (faster convergence of transaction prices)
and higher allocative efficiency (E_a) than a CDA when both are populated by homogeneous ZI-C traders,
and can perform comparably to a CDA when both are populated by homogeneous GD traders.
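
For readers unfamiliar with the two measures, the sketch below shows how they are commonly computed. It
assumes the usual definitions, namely that α is the root mean squared deviation of transaction prices from
the equilibrium price expressed as a percentage of that price, and that E_a is the realized surplus as a
percentage of the maximum possible surplus. It also assumes single-unit traders. The function names are
hypothetical and this is not the JCAT code.

```python
import math


def coefficient_of_convergence(prices, equilibrium_price):
    """alpha = 100 * RMS deviation from the equilibrium price / that price."""
    mean_sq = sum((p - equilibrium_price) ** 2 for p in prices) / len(prices)
    return 100.0 * math.sqrt(mean_sq) / equilibrium_price


def equilibrium_surplus(buyer_values, seller_costs):
    """Maximum total surplus: match highest values with lowest costs."""
    buyers = sorted(buyer_values, reverse=True)
    sellers = sorted(seller_costs)
    return sum(b - s for b, s in zip(buyers, sellers) if b > s)


def allocative_efficiency(trades, buyer_values, seller_costs):
    """E_a = 100 * realized surplus / equilibrium surplus.

    trades is a list of (buyer_value, seller_cost) pairs, one per transaction.
    """
    realized = sum(b - s for b, s in trades)
    return 100.0 * realized / equilibrium_surplus(buyer_values, seller_costs)
```

A lower α means that transaction prices stay closer to the equilibrium price, while an E_a of 100 means
that no potential gains from trade were lost.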

We replicated these experiments using JCAT and ran additional ones for the three new mechanisms with
similar configurations. The results of these experiments are shown in Table 2.^5 The best result in each column
is shaded. We can see that SM7.1 with ZI-C traders and SM88.0 with GD traders each give higher E_a
than the best of the existing markets, and both of these increases are statistically significant at
the 95% level. Both cases also lead to low α, not the lowest in the column but close to the lowest, and the
differences between them and the lowest are not statistically significant at the 95% level. Thus the grey-box
approach can generate mechanisms that perform as well in the single market scenario as the best mechanisms
from the literature.

^5 Our results are slightly different from those in [24], but the pattern of these results still holds. In
addition, we ran an NCDAEE variant (δ = 30) that was not tested in [24], observing that those with δ < 20 do
not perform well when populated by GD traders.



5 Conclusions and future work

This paper describes a practical approach to the automated design of complex mechanisms. The approach
that we propose breaks a mechanism down into a set of components each of which can be implemented in
a number of different ways, some of which are also parameterized. Given a method to evaluate candidate
mechanisms, the approach then uses machine learning to explore the space of possible mechanisms, each
composed from a specific choice of components and parameters. The key difference between our approach
and previous approaches to this task is that the score from the evaluation is not only used to grade the
candidate mechanisms, but also the components and parameters, and new mechanisms are generated in a
way that is biased towards components and parameters with high scores.

The specific case study that we used to develop our approach is the design of new double auction
mechanisms. Evaluating the candidate mechanisms using the infrastructure of the TAC Market Design
competition, we showed that we could learn mechanisms that can outperform the standard mechanisms against
which learning took place and the best entries in past Market Design competitions. We also showed that the
best mechanisms we learnt could outperform mechanisms from the literature even when the evaluation did not
take place in the context of the Market Design game. These results make us confident that we can generate
robust double auction mechanisms and, as a consequence, that the grey-box approach is an effective approach
to automated mechanism design.

Now that we can learn mechanisms effectively, we plan to adapt the approach to also learn trading
strategies, allowing us to co-evolve mechanisms and the traders that operate within them.